Lecture - MIT MAS962 Computational semantics - Generative Lexicon 2

Greg Detre

@14:30 Tuesday, October 08, 2002

with Federica Busa

Federica, Pustejovsky, "The Generative Lexicon" 2

why did GL make certain choices? how does it differ from WordNet (WN)?

what do we mean by the "lexicon"?

Push: is that the same as the question of what we consider linguistic and non-linguistic?

GL has made the choice of looking at sentences to help make the generalisations

you can't say "a good cloud" as easily as "a good chair" - presumably, Pustejovsky would say this is because "cloud" doesn't carry the telic information that the modifier "good" needs to attach to - that is, in order for "good cloud" to make sense, you have to construct a virtual cloud type within a context, right???

Deb: are we trying to create a type system of discrete types based on a softer/continuous underlying system?

surely not, because the creation of virtual types is dynamic - in the same way that "audience" is a word that comes into being on the fly

actually, she thinks the case of "audience" is a bit different, because an audience's persistence is based on its referents, whereas a cloud's referents are permanent (whether or not I'm there)

Deb: can things be a little bit complex?

what is complex???

= "(dot objects) to model objects with multiple and interdependent denotations" (Pustejovsky)
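A minimal sketch of how a dot object might be modelled, assuming the standard GL example of "book" as physical object plus information. The class name `DotType`, the table `PREDICATE_SELECTS` and the helper `facet_for` are all hypothetical names made up for illustration, not part of any GL implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DotType:
    """A complex (dot) type: one lexical item with several interdependent facets."""
    components: Tuple[str, ...]

    def admits(self, required: str) -> bool:
        # A predicate can apply if any facet of the dot type matches what it selects for.
        return required in self.components


# Assumed example: "book" denotes both a physical object and its informational content.
BOOK = DotType(("physical_object", "information"))

# Hypothetical selectional requirements for two adjectives.
PREDICATE_SELECTS = {"heavy": "physical_object", "interesting": "information"}


def facet_for(predicate: str, noun_type: DotType) -> Optional[str]:
    """Return the facet of the noun that the predicate picks out, or None if none fits."""
    required = PREDICATE_SELECTS[predicate]
    return required if noun_type.admits(required) else None
```

On this sketch, "heavy book" picks out the physical facet and "interesting book" the informational one, without needing two separate senses of "book".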

in WordNet: synset - one semantic node with many lexical items

qualia structure - internal syntax for structuring concepts
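As a rough illustration of qualia structure, here is a sketch using the four qualia roles from Pustejovsky's GL (formal, constitutive, telic, agentive). It also replays the "good chair" vs "good cloud" contrast from earlier in these notes. The entries in `LEXICON` and the helper `interpret_good` are assumptions made up for the example, not taken from any published lexicon.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Qualia:
    """The four qualia roles of a lexical entry; any role may be left unspecified."""
    formal: Optional[str] = None        # what kind of thing it is
    constitutive: Optional[str] = None  # what it is made of
    telic: Optional[str] = None         # its purpose or function
    agentive: Optional[str] = None      # how it comes into being


# Illustrative entries only (assumed values).
LEXICON = {
    "chair": Qualia(formal="furniture", constitutive="legs, seat, back",
                    telic="sit on", agentive="build"),
    "cloud": Qualia(formal="natural_object", constitutive="water droplets"),
}


def interpret_good(noun: str) -> Optional[str]:
    """Attach 'good' to the noun's telic role; return None if there is no telic role."""
    telic = LEXICON[noun].telic
    if telic is None:
        return None
    return f"a {noun} that is good to {telic}"
```

Here `interpret_good("chair")` yields a reading, while `interpret_good("cloud")` returns `None`, which is one way of modelling why "good cloud" needs a context to supply a purpose.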

they admit that their particular structure is contingent (so how do they defend it???)

Deb: mental vs ideal?

you'd have to ask Pustejovsky - she's not convinced about that

functional types - can be recursively generated, by embedding a simple type in larger numbers of qualia

Deb: is there anywhere to encode temporal or modal logic?

GL says it's not the job of the lexicon to tell you which interpretation applies, only to generate the possibilities
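One way to picture "generate the possibilities, don't choose": a sketch that enumerates event readings of a verb-noun pair from the noun's qualia, using the familiar GL example of "begin a book". The dictionary `BOOK_QUALIA` and the function `readings` are hypothetical, and the sketch deliberately does no disambiguation.

```python
from typing import Dict, List

# Assumed qualia for "book": its purpose (telic) and its mode of creation (agentive).
BOOK_QUALIA: Dict[str, str] = {"telic": "read", "agentive": "write"}


def readings(verb: str, noun: str, qualia: Dict[str, str]) -> List[str]:
    """List every event reading licensed by the noun's qualia, in role-name order.

    Nothing here picks a winner; choosing among readings is left to context.
    """
    return [
        f"{verb} to {event} the {noun} ({role})"
        for role, event in sorted(qualia.items())
    ]
```

Calling `readings("begin", "book", BOOK_QUALIA)` returns both the writing and the reading interpretation; a discourse or pragmatic component would have to select between them.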

SIMPLE was the project of defining the language-independent knowledge representation for 12 European languages

demonstrates how GL would differentiate between all the different senses of a word using the different qualia parameters

Deb: a vast proportion of the words that kids use early on don't fit into the GL tripartite (events/objects/qualities) schema

other words that GL has trouble with: yes/no, more, I, and/or, quantifiers etc.

Miguel: perhaps they're in the grammar... or in the internal structure of the knowledge base - they're operational. they're closed.

what does it mean to say that they're "closed"???

apparently the pronouns are a closed grammatical system

difficult to say whether prepositions are closed or open

in GL, a lot of these would be operators - Federica doesn't think you'd want these in your type system

e.g. you'd fit "want" into the event hierarchy (with telic and other qualia bits and bobs(???))

type composition

the "generative" in GL arises from the compositionality of the qualia relations

Deb: if there are only 10ish operators, does that mean that there are only 10 possible interpretations per word? no, there are as many as you have parameters...(???)

Push: isn�t this how linguistic analysis has always been done?

at least, in the AI community, this method of concept-formation is not new

Federica thinks it's certainly an improvement on the huge enumerations you get in the linguistic community

she says she's not aware of any system that's done this on a very large scale in the AI community

Peter: how did you test coverage?

they had an algorithm which went through huge amounts of text and tried to pull out everything that it thought was a compound - then they would look over it afterwards

example: they were working on a corpus based on travel books - dynamically generated categories, e.g. "French villages", "river villages" from "charming towns in Europe"

the rules for compounding and for noun-preposition-noun phrases are exactly the same, e.g. "towns in Europe"
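A toy sketch of what "pulling out compounds" could look like, treating noun-noun compounds and noun-preposition-noun phrases with the same head/modifier rule as the note suggests. This is not the algorithm described in the lecture; the coarse tags ("N", "P", "ADJ"), the pre-tagged input format and the function `candidate_pairs` are all assumptions for illustration.

```python
from typing import List, Tuple

Tagged = Tuple[str, str]  # (word, coarse part-of-speech tag); the tag set is assumed


def candidate_pairs(tokens: List[Tagged]) -> List[Tuple[str, str]]:
    """Return (head, modifier) candidates from a tagged token sequence.

    Noun-noun:       "river villages"  -> head "villages", modifier "river"
    Noun-prep-noun:  "towns in Europe" -> head "towns",    modifier "Europe"
    """
    pairs: List[Tuple[str, str]] = []
    for i, (word, tag) in enumerate(tokens):
        if tag != "N":
            continue
        # Noun immediately followed by a noun: the second noun is the head.
        if i + 1 < len(tokens) and tokens[i + 1][1] == "N":
            pairs.append((tokens[i + 1][0], word))
        # Noun, preposition, noun: the first noun is the head.
        if (i + 2 < len(tokens)
                and tokens[i + 1][1] == "P"
                and tokens[i + 2][1] == "N"):
            pairs.append((word, tokens[i + 2][0]))
    return pairs
```

Both patterns end up as the same kind of (head, modifier) pair, so a single downstream rule can build a category such as "villages" restricted by "river" or "towns" restricted by "Europe". The output would still need the manual review the notes mention.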

started with a corpus of 200,000-300,000 book descriptions - it went wrong when that went up to 2.5m

that included an awful lot of gumph though, which you'd want to filter out first

is this intended to fully describe a concept, or simply distinguish it??? I think it's intended as a full description - but GL amounts to little more than word association without meaning - same problem as Quillian's semantic nets

Deb: this is the same as the original question, "what needs to be in a lexicon?"

she doesn't think that the lexicon entry "book" needs much more than its telic role and some genre information etc. - that's all you need for your type system - then you need the composition rule, detailing the operations you perform on "book" in terms of deployment - she thinks as a linguist, rather than in terms of information retrieval - the lexicon provides the prototypes

how you apply the four levels of representation to discourse models...(???)

important: given the state of current language technology, you need to have a well-bounded problem

Push: it's easy to see other applications for which this won't be enough, e.g. "I want a book for my four-year-old daughter" - that won't be in the lexicon

top ontology for EuroWordNet

1stOrderEntity/2ndOrderEntity

I think she said it's in line with the GL, in that it has qualia-like structuring at the top level (function/composition/origin/form)

Deb: the 1st vs 2nd order split in EuroWordNet is quite essential

she says that their 2nd order hierarchy is really the GL event hierarchy

in Lexeme: had the prepositions fairly high up

EuroWordNet - pretty much completed, she thinks

Deb: does that mean it's dead?

it might just be the way that European funding works, in terms of finite/fixed-term deliverables

metatags for the synset

original question: what should be in the lexicon?

knowledge that has syntactic consequences...???

Deb: perhaps you can't draw a line between what's lexical and what's not

so, the lexicon is everything, in that it does know what sort of books to buy for a 4-year-old

very idiosyncratic set of knowledge + experience, but you still wouldn't want to exclude any of it from what he'd view as the lexicon

it certainly seems implausible to me that you can draw any real/meaningful/clean distinction in the brain between lexicon etc.

to what extent do I think that grammar should be contained in the lexicon???

Push: can't do meaning purely compositionally, e.g. "difficult"

I would want to define/rest the concept of "difficult" upon effort, which is a physical sensation... right???

in order to try and infer qualia from some corpus, you have to start from knowing something

apparently, e.g. prepositions are very useful to know in English

Deb: asked Push about the OpenMind data collection - how do you categorise it? do the categories correspond to the GL categories?

they went in with a certain preconceived idea of what they'd want to know

Hugo is working on turning the OpenMind database into a WordNet-like lexical database

what would be really cool would be to have some means of converting the OpenMind database into a lexical database dynamically according to your chosen parameters for the day, so you could have a 30-type top level, or a binary tree, or whatever you felt like, to see which worked best at answering queries and fitting intuitions, eh???

except, this is more or less the problem of AI :( what does this actually mean/how would you actually go about it???

Admin

no class next week (15th October)

then reading the Brooks + Cantwell for 22nd October

future papers - looking at grounding and non-symbolic...

will the presentations be on-line? depends on the individuals who wrote them

Projects

what you're doing, how you're doing it, what you're hoping to achieve

prefer individual

Questions

can the GL be flexible about having multiple ways of cutting up the world (as opposed to e.g. entities/actions/qualities)???

I'm pretty sure it can't - hmmm, having said that, the only commitments it really seems to make are down to the (still very high) level of natural/functional/complex - presumably, SIMPLE made its own ontological commitments within the framework of GL

temporal or modal logic???

events/objects/qualities vs entities/actions/qualities???

1stOrderEntity vs 2ndOrderEntity in EuroWordNet???

look at Minsky's 16ish top-level nodes (in Society of Mind), similar to Jackendoff's

can we see WordNet (bottom-up) and GL (top-down) as being opposite directions towards extensional categorisation???

how is the OpenMind database stored???